The second argument: a logical indexing vector built from a comparison operator?
But the variable OncotreeLineage does not exist in our environment!
Rather, OncotreeLineage is a column from metadata, and we are referring to it as a data variable. We can directly refer to the column vector metadata$OncotreeLineage with just OncotreeLineage.
The input arguments for select() are also data variables.
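As a minimal sketch of data variables, assume the dplyr package is loaded and use a small made-up dataframe called toy_metadata (a hypothetical stand-in, not the course's metadata). Its column names are chosen only to mirror the ones discussed above.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration.
toy_metadata <- data.frame(
  ModelID = c("M1", "M2", "M3"),
  OncotreeLineage = c("Breast", "Lung", "Breast"),
  Age = c(45, 60, NA)
)

# Inside filter(), OncotreeLineage is looked up as a column of toy_metadata,
# even though no object with that name exists in the environment.
toy_breast <- filter(toy_metadata, OncotreeLineage == "Breast")

# The same rows in base R, where the column must be referenced with `$`.
toy_breast_base <- toy_metadata[toy_metadata$OncotreeLineage == "Breast", ]

# select() takes data variables too: bare column names, no quotes or `$`.
toy_selected <- select(toy_breast, ModelID, Age)
toy_selected
```

Typing OncotreeLineage on its own at the console would still give an "object not found" error, because the name only has meaning inside functions such as filter() and select() that are given the dataframe.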
Summary statistics
Now that your dataframe has been transformed based on your scientific question, you can start doing some analysis on it!
If the column of interest is numeric, consider applying functions such as mean(), median(), or max() to it.
If the column of interest is character or logical, consider table().
mean(breast_metadata$Age, na.rm = TRUE)
[1] 50.96104
table(breast_metadata$Sex)
 Female Unknown
     91       1
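To see why na.rm = TRUE appears in the call above, here is a small sketch on made-up data. The dataframe toy and its values are hypothetical and are not taken from breast_metadata.

```r
# Hypothetical toy data; values are made up for illustration.
toy <- data.frame(
  Age = c(45, 60, NA, 52),
  Sex = c("Female", "Female", "Unknown", NA)
)

# Without na.rm = TRUE, a single missing value makes the result NA.
mean(toy$Age)

# With na.rm = TRUE, missing values are dropped before computing.
mean(toy$Age, na.rm = TRUE)
median(toy$Age, na.rm = TRUE)
max(toy$Age, na.rm = TRUE)

# table() counts each distinct value and leaves out NA by default.
table(toy$Sex)

# useNA = "ifany" adds a separate count for missing values when there are any.
table(toy$Sex, useNA = "ifany")
```

Whether dropping missing values is appropriate depends on the analysis, so it is worth checking how many there are before relying on na.rm = TRUE.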
Code readability with many nested functions
When combining multiple functions in one expression, it gets harder to read: